Language model estimations and representations for real-time continuous speech recognition

Authors

Giuliano Antoniol

Fabio Brugnara

Mauro Cettolo

Marcello Federico

Abstract

This paper compares different ways of estimating bigram language models and of representing them in a finite state network used by a beam-search-based, continuous-speech, speaker-independent HMM recognizer. Attention is focused on the n-gram interpolation scheme, for which seven models are considered. Among them, the Stacked estimated linear interpolated model compares favourably with the best known ones. Further, two different static representations of the search space are investigated: “linear” and “tree-based”. Results show that the latter topology is better suited to the beam-search algorithm. Moreover, this representation can be reduced by a network optimization technique, which decreases the dynamic size of the recognition process by 60%. Extensive recognition experiments on a 10,000-word dictation task with four speakers are described, in which an average word accuracy of 93% is achieved with real-time response.
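As a rough illustration of the interpolation idea mentioned in the abstract, the sketch below mixes a bigram relative frequency with a unigram relative frequency using a single fixed weight. It is not the Stacked estimation procedure or any of the seven models from the paper; the function names, the toy corpus, and the weight `lam=0.7` are assumptions chosen only for this example.

```python
from collections import Counter


def train_counts(sentences):
    """Collect unigram, history, and bigram counts from tokenized sentences."""
    unigrams, histories, bigrams = Counter(), Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        # A word counts as a history only when another word follows it.
        histories.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return unigrams, histories, bigrams


def interpolated_bigram(w_prev, w, unigrams, histories, bigrams, lam=0.7):
    """P(w | w_prev) = lam * f(w | w_prev) + (1 - lam) * f(w).

    The weight lam is a fixed, hypothetical value here; real systems
    estimate it from held-out data.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[w] / total if total else 0.0
    if histories[w_prev] == 0:
        # Unseen history: fall back entirely on the unigram estimate.
        return p_uni
    p_bi = bigrams[(w_prev, w)] / histories[w_prev]
    return lam * p_bi + (1.0 - lam) * p_uni


# Toy corpus (invented for illustration only).
corpus = [["a", "b", "a", "c"], ["a", "b"]]
uni, hist, bi = train_counts(corpus)
probs = {w: interpolated_bigram("a", w, uni, hist, bi) for w in uni}
```

In this sketch the bigram "a a" never occurs in the toy corpus, yet it still receives a non-zero probability from the unigram term, which is the purpose of interpolation: unseen word pairs remain reachable during the search.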

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Language modelling for efficient beam-search

This paper considers the problems of estimating bigram language models and of efficiently representing them by a finite state network, which can be employed by a hidden Markov model based, beam-search, continuous speech recognizer. A review of the best known bigram estimation techniques is given together with a description of the original Stacked model. Language model comparisons in terms of p...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

A 40-nm 54-mW 3×-real-time VLSI processor for 60-kWord continuous speech recognition

This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). We implement a parallel and pipelined architecture for GMM computation and Viterbi processing. It includes an 8-path Viterbi transition architecture to maximize the processing speed and adopts a tri-gram language model to improve the recogni...

VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition

We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel ...

Publication date: 1994
